Phylogenetic and population genetic analyses of Thrips tabaci Lindeman (Thysanoptera: Thripidae) on Allium host in India

Background Onion thrips (Thrips tabaci) is a complex of cryptic species with subtle morphological differences and distinct genetic backgrounds; thus, species identification using traditional methods remains challenging. The existence of different haplotypes and genotypes within a species can significantly influence various aspects of its biology, including host preference, reproductive capacity, resistance to pesticides, and vector competence for plant viruses. Understanding the genetic diversity and population structure of cryptic species within T. tabaci will not only aid in the development of more effective control strategies tailored to specific genetic variants but also in monitoring population dynamics, tracking invasive species, and implementing quarantine measures to prevent the spread of economically damaging thrips biotypes. Methods This study aims to explore intraspecies genetic diversity and molecular evolutionary relationships of the mitochondrial cytochrome oxidase gene subunit I (mtCOI) in T. tabaci populations from India. To capture diversity within the Indian T. tabaci populations, amplicon sequencing was performed for the thrips mtCOI gene from eight diverse localities in India. A total of 48 sequences retrieved for the mtCOI gene from the NCBI Nucleotide database were analysed. Results Multiple insertions and deletions were detected at various genomic positions across the populations from different localities, with the highest variation observed in the 300–400 genome position range. Molecular diversity analyses identified 30 haplotypes within the population, with certain subpopulations exhibiting higher gene flow. Analysis of single nucleotide polymorphism patterns within the mtCOI gene across diverse Indian locales revealed significant intrapopulation genetic heterogeneity and its potential repercussions on gene functionality. Elevated F statistics (Fst) values in the northern–western subpopulations suggested high genetic variability, particularly evident in haplotype networks originating mainly from the northern region, notably Delhi. While most populations displayed stable and ancient evolutionary histories, thrips populations from northern, western, and north-eastern regions indicated rapid growth.


INTRODUCTION
Thrips is a widespread polyphagous insect pest from the order Thysanoptera.Globally, several species of thrips cause serious damage to crops (Mouden et al., 2017;Stuart, Gao & Lei, 2011;Raut et al., 2020).Among these, the onion thrips, Thrips tabaci Lind, is an important pest that costs significant economic losses to various crops every year (De Kogel et al., 2015).Although many thrips have been reported in Alliums, T. tabaci is most commonly observed (Porta et al., 2023).In addition to direct damage to the crop, this insect serves as a vector of two devastating viruses, Iris Yellow Spot Virus (IYSV) and Tomato Spotted Wilt Virus (TSWV) (Komondy et al., 2024;Nachappa et al., 2020).Moreover, damage due to feeding by T. tabaci increases susceptibility to infections by other pathogens in onion (Leach et al., 2019;Grode et al., 2017).T. tabaci is a complex of genetically differentiated and widespread insect pests that are adapted to diverse climatic conditions (Brunner et al., 2004;Gill et al., 2015).The genetic variation within this species positively influences their adaptability under diverse climatic conditions.This genetic variation is also linked to their reproductive modes, such as thelytoky, arrhenotoky, and deuterotoky (Kobayashi & Hasegawa, 2012;Nault et al., 2006;Gawande et al., 2017;Porta et al., 2023), as well as vector competence (Jacobson & Kennedy, 2013).
To study the genetic variation in thrips, various genes have been targeted and analysed for phylogenetic analyses and species identification (Buckman, Mound & Whiting, 2013;Rebijith et al., 2013).DNA barcoding is a novel system of species identification; it uses the mitochondrial cytochrome C oxidase subunit 1 gene (cox1 or COI) as a standardized single molecular marker for the identification, comparison, and categorisation of animal species (Hebert et al., 2003).The DNA barcoding method has a prerequisite of linking species identification to a voucher from a curated biological collection.This enables follow-up and the development of a method for verifying species identification.
In Thysanopetra, targeting mitochondrial genome is the most suitable approach for molecular identification as well as species recognition irrespective of life stages, sex, and polymorphism (Asokan et al., 2007), identifying cryptic species (Glover et al., 2010;Rebijith et al., 2013), biotypes (Shufran et al., 2000), and haplotypes and host-and geographic-associated genetic differences (Rebijith et al., 2013;Brunner et al., 2004).Three distinct lineages (T-Tobacco associated, L1 and L2-Leek associated) in T. tabaci based on mitochondrial data were proposed (Brunner et al., 2004;Kobayashi, Yoshimura & Hasegawa, 2013;Jacobson et al., 2016).Using mitochondrial cytochrome oxidase I (mtCOI) sequences, microsatellite markers and vector competence were documented in accordance with the transmission efficiency of viruses in thrips population (Jacobson & Kennedy, 2013).DNA barcoding using mtCOI gene for species-level identification has received widespread support as an additional technique to resolve taxonomic ambiguities.According to the International Barcode of Life Consortium, genetic variation below the level of a species has been estimated using mitochondrial genes.Its importance in quick and reliable species identification has been widely acknowledged in a numerous taxa worldwide (International Barcode of Life Consortium, 2023).This method has also been applied for thrips identification, diversity analysis (Bravo-Pérez et al., 2018), phylogenetic analysis (Gawande et al., 2017), population structure (Porta et al., 2023), and invasive genetics (Tyagi et al., 2017).
In India, phylogenetic analyses based on the mtCOI gene were carried out, revealing different biotypes of thrips on different hosts and exhibiting differences in their mtCOI gene (Asokan et al., 2007).The existence of heteroplasmy was identified in T. tabaci by analysing mtCOI haplotypes from different geographic regions in India (Gawande et al., 2017).However, an exhaustive analysis of genetic diversity and population genetic variation of T. tabaci in India is yet to be conducted.Its correlation with data from other countries also needs to be studied.No contemporary Indian research has reported population genetics analysis on T. tabaci.Despite extensive studies on thrips genotyping using mitochondrial mtCOI markers, the extent of crypticness within T. tabaci remains unknown.With the advent of next-generation sequencing, it has become easier to characterize the level of variation through amplicon sequencing of the mtCOI gene in a population.This technique allows for the processing and characterization of a large amount of sequence data, simplifying the capture of sequence variations within a single gene fragment.Herein, we aimed to unravel the genetic variation and structure of T. tabaci on Allium hosts from different geographic locations across India using amplicon sequencing of the mtCOI gene.We characterized the extent of genetic cryptic nature of the onion-infesting T. tabaci population in India using advanced bioinformatics tools.

Ethics statement
Thrips tabaci sampling from Allium host did not involve any endangered species.

Specimen collection
T. tabaci adults were collected from Allium hosts at 10 locations across different climatic regions in India from 2017 to 2019, as outlined in Table 1.The sampling sites were chosen to encompass maximum agro-climatic zones where onions are widely cultivated.T. tabaci identification was confirmed based on taxonomic characteristics such as grey ocellar pigmentation, brown antennal segments III-IV with the basal half pale, forewings being pale, and 7-segmented antennae.Collected specimens were preserved in 95% ethanol and stored at −20 • C prior to DNA extraction.

PCR amplification of a COI fragment and amplicon sequencing
Thrips were removed from the ethanol stock by using a fine brush.Preserved thrips were dried for 5 min and total genomic DNA was extracted using a DNeasy Blood & Tissue Kit (Qiagen GmbH, Hilden, Germany) from 50 adult thrips of each location, as per protocol described by the manufacturer.A 525-bp fragment of the mitochondrial cytochrome oxidase I was amplified using the primer pair COI-1F TGTTACGGCTCACGCTTTTG and

Single nucleotide polymorphism analysis
Single nucleotide polymorphism (SNP) analysis started with a quality assessment of the raw amplicon sequence data by using FASTQC (https://github.com/s-andrews/FastQC/releases/tag/v0.12.1), ensuring data integrity for further analysis.Sequences were aligned using Bowtie2 (https://bowtie-bio.sourceforge.net/bowtie2/index.shtml),with a specific isolate as the reference sequence (KF724977.1),establishing a standard for variation detection.SNP calling was performed using BCFtools mpileup (https: //samtools.github.io/bcftools/bcftools.html),meticulously identifying SNPs by analyzing sequence read quality and coverage depth.This analysis unveiled disparity in the variable sites between two datasets, A and B, which had 40 and 13 variable sites, respectively; a higher divergence rate was observed in dataset B despite shorter sequence lengths.This suggested that sequence length influences the perceived genetic variation, highlighting the intricate relationship between genetic sequence characteristics and evolutionary dynamics.

Preparing a dataset for genetic diversity analyses
To elucidate genetic diversity among Indian species, an analytical study was conducted focusing on mtCOI gene sequences of Indian species retrieved from the NCBI-GenBank database.Acknowledging the inherent variability in sequence lengths-a reflection of genetic diversity-the dataset was categorised into two groups based on nucleotide length for refined computational analysis.Group A contained 25 sequences, each up to 516   2 and 3, respectively).This division was strategic, catering to the computational demands posed by sequence length variability.
The MAFFT (https://mafft.cbrc.jp/alignment/server/)tool was used for multiple sequence alignment (MSA), and high-quality alignment was achieved by optimizing sequence homology and minimizing alignment gaps.Thereafter, genetic distance was calculated using the uncorrected p-distance model in MEGA-X, providing a direct measure of genetic divergence without presupposing any evolutionary model.Phylogenetic relationships among the sequences were deciphered using IQ-TREE software (Trifinopoulos et al., 2016), employing the maximum likelihood method with the Jukes-Cantor model (Tamura et al., 2013) of nucleotide substitution.This method is crucial for determining the most probable evolutionary relationships among the species.Further, genetic diversity was analyzed using DNASP v.5 (Librado & Rozas, 2009) and Arlequin 3.5 (Excoffier & Lischer, 2010) tools, aimed at evaluating genetic variation and structure within and between populations.

Sequencing data
Data generated from amplicon sequencing were used for phylogenetic analysis.The sequence produced by Illumina sequencing was used to identify the homologs of the DOGR isolates by using NCBI BLAST.The results obtained from BLAST were considered to allocate the closest homologs of DOGR new isolate.These homologs were used for the phylogenetic analysis.Statistics of a raw amplicon sequencing are presented in Table 4.The highest and lowest number of reads were generated in samples from Sikkim (247,362) and Tamil Nadu (204,698), respectively.GC content was 33.6-34.5%.More than 84% reads secured a Phred score of 30, indicating high-quality data.The Phred quality score numerically expresses the accuracy of each nucleotide; the higher the Q number, the greater is the accuracy of data.Raw sequencing data were submitted to the NCBI SRA database (PRJNA917098).

Genomic distribution of SNPs
We obtained partial (420 bp) mtCOI sequences for T. tabaci sampled from Allium hosts from eight climatic regions of India.SNPs within the partial COI gene differentiated T. tabaci populations into 30 mtDNA haplotypes in total, with 17 and 13 haplotypes in groups A and B, respectively.Locality-wise percent variation was calculated from the total reads recorded (Fig. 1).307_Palampur represented the highest nucleotide polymorphism, followed by 306_Tamilnadu, 305_Maharashtra, 303_Haryana, and 302_Gujarat.Compared to other localities, a low level of mitochondrial genetic variation was observed in 304_Delhi isolates.We examined the frequency and distribution of SNPs at different locations in India.Numerous insertions and deletions of varying lengths were found in the genomic positions of different localities.For each locality, the highest and lowest variations were observed at the genome positions 300-400 and 150-250, respectively.Thus, the range of diversity at nucleotide level reported within mtCOI gene was 1-420 bp, which was used herein via high throughput sequencing (Fig. 2A).The top three SNPs in each 50-bp interval of mtCOI gene sequence data are shown in Fig. 2B.More detailed representation of SNP distribution in the mtCOI sequences of different samples is presented in Fig. S1.

Molecular phylogenetic analysis
Phylogenetic trees confirmed that molecular diversity in the same geographical region is possible if the environmental conditions are favourable.Molecular evolution might not be affected by geographical proximity in various subpopulations because sequences have significantly different clusters (i.e., they are not clustered as per their geographical locations).Among phylogenetic trees, most sequences formed individual nodes, suggesting highly diverse populations.As shown in Fig. 3, northern and north-eastern (Sikkim strain) subpopulations showed similarities because they formed one clade, which is isolated from the central region of India (Madhya Pradesh); similarly, the southern subpopulation (Karnataka) formed one clade, indicating the presence of a uniformly evolving population.Because of diversity in the north-eastern subpopulation, they were considered two differently grouped nodes.Similarly, strains from eastern India (Odisha and West Bengal as well as Bihar) were clustered differently, despite close geographical locations.
As shown in Fig. 4, newly sequenced isolates were closely related, irrespective of geographical locations, indicating stabilization of the thrips population (302_Gujrat, 305_Maharashtra, 309_Tripura, 306_Tamilnadu).Strains from West Bengal showed uniform compositions because they were grouped with different geographical locations,

Genetic diversity analysis
The study of polymorphism in DNA sequences is a prevailing method for elucidating genetic diversity and functional relevance of these genetic variations.Genetic diversity parameters such as number of polymorphic sites (S), number of haplotypes (H), haplotype diversity (Hd), nucleotide diversity (Pi), and average number of nucleotide differences (K) were determined using DnaSP 5.12.01 (Librado & Rozas, 2009).Sequence analysis of Group A suggested that the population is stable with long evolutionary history and high Hd and Pi values (Hd > 0.5 and Pi > 0.005) (Table 5).Eastern populations of groups A and B as well as the northern population of Group A showed similar characteristics.Southern population of Group A had high Hd and low Pi values (Hd > 0.5 and Pi < 0.005), representing population bottleneck followed by rapid growth in population File S1).Western population of Group A showed recent population bottleneck or very few mtDNA lineages, as they had low Hd and Pi values (Hd < 0.5 and Pi < 0.005) (Grant & Bowen, 1998).
Higher Nm values indicating high gene flow were observed more frequently across populations in Group A than in Group B. North-eastern and eastern Indian subpopulations with respect to northern and southern subpopulations from Group A and western population with northern subpopulation from Group B had Nm values <1, indicating  and northern subpopulations, along with north-eastern subpopulations in Group A, indicated a low degree of differentiation among both subpopulations.Haplotype networks were constructed using the TCS algorithm in the population analysis with reticulate trees (POPART) (Leigh & Bryant, 2015).The POPART haplotype networks are illustrated in Figs. 5 and 6.Group A comprised eight haplotypes, with 307_Himachal Pradesh and 303_Haryana as the most distant haplotypes.Group B had seven haplotypes; the most distant haplotype was from Tripura.Some states had more haplotypes, such as Maharashtra (KF724977.1_Maharashtra),Delhi (MN594551.1_Delhi),and West Bengal; this may be an indication of population stabilization or successful niche formation in the species of these regions.Further research with larger datasets is necessary to examine the diversity and molecular evolution in these geographical regions.

DISCUSSION
T. tabaci, commonly known as onion thrips, represents a complex of cryptic species with diverse genetic backgrounds.These cryptic species exhibit subtle morphological differences, making it challenging to use traditional identification methods.Characterizing crypticness within T. tabaci is crucial because of the potential implications for agriculture and pest management.The existence of different haplotypes and genotypes within a species can significantly influence various aspects of its biology, including host preference, reproductive capacity, resistance to pesticides, and vector competence for plant viruses (Gawande et al., 2017;Jacobson & Kennedy, 2013;Li et al., 2014;Toda & Morishita, 2009).Understanding the genetic diversity and population structure of cryptic species within T. tabaci can aid the development of more effective control strategies tailored to specific genetic variants.Furthermore, accurate identification of cryptic species is essential for monitoring population dynamics, tracking invasive species, and implementing quarantine measures to prevent the spread of economically damaging thrips biotypes.In this study, we attempted to characterize the crypticness within Indian T. tabaci population based on mtCOI gene sequencing for elucidating its evolutionary history.We determined that significant differences across the selected localities may be attributed to a combination of factors, namely geographical isolation, migration and gene flow, ecological variation, historical factors, and host plant associations.
The phylogenetic tree generated in our research showed a high degree of diversity, with new and a large number of gene sequences, suggesting that Indian thrips isolates continue to evolve and diversify.Higher intraspecific genetic variations in mtCOI gene of T. tabaci have been reported in economically important vegetables and crops host from different countries (Kadirvel et al., 2013;Li et al., 2020).
Nucleotide sequences of mtCOI gene were used for population genetics analysis of onion thrips from eight regions in India.This helped characterise species diversity, geographical distribution of a species, and tracing the phylogeographic origin (Lombaert et al., 2014 Porta et al., 2023).In contemporary literature, only 28 haplotypes of mtCOI gene of T. tabaci have been reported worldwide, including Asia, Europe, Australia, America and other 23 countries, and these haplotypes were retrieved from nearly 282 nucleotide sequences (Iftikhar et al., 2016).This study investigated SNPs within the partial COI gene and differentiated T. tabaci populations into 30 mtDNA haplotypes, with 17 and 13 haplotypes in groups A and B, respectively.This indicated higher genetic variation or diversity of haplotypes in the Indian population of T. tabaci.The mtCOI region of thrips from Palampur represented the highest nucleotide polymorphism, followed by Tamil Nadu, Maharashtra, Haryana, and Gujarat.The high diversity in these locations may be attributed to climatic conditions and ecological factors such as local flora and fauna.Mitochondrial genetic variation observed in Delhi isolates was found to be lower than those in isolates from other regions.This can be attributed to several factors associated with the urban landscape of the region.Unlike other localities included in the study, Delhi's urban environment lacks the floral and fauna diversity that could influence the genetic makeup of the local T. tabaci population.A comprehensive analysis of SNP patterns within the mtCOI gene across diverse Indian locales revealed significant intrapopulation genetic heterogeneity and its potential repercussions on gene functionality.The SNP at the nucleotide position 63 (thymine to adenine transition) demonstrated remarkable allelic variation, indicating substantial genetic divergence among Indian cohorts.This phenomenon, in parallel with analogous allelic fluctuations at nucleotide positions 7 (cytosine to adenine transversion) and 399 (thymine to guanine transversion), accentuates the intricate mosaic of regional genetic variability.The notable allelic diversity at specific loci, particularly at position 63 in Rajgurunagar, Maharashtra, and position 7 in Palampur, Himachal, and Lembucherra, Tripura, suggest the possible adaptive evolutionary mechanisms in response to local environmental or nutritional selective pressures.These loci represent critical sites for probing the evolutionary trajectory and functional integrity of the COI gene.Distinct allelic distributions in regions such as Palampur, Himachal, and Coimbatore, Tamil Nadu, could mirror unique historical migration patterns or adaptive responses.Furthermore, the pronounced allelic variability at selected sites highlights the possibility of consequential impacts on COI gene transcriptional activity or protein efficacy, potentially affecting mitochondrial operational efficiency and predisposition to mitochondrial disorders.Nonetheless, loci exhibiting minimal allelic variation might reflect a shared ancestral genetic foundation, underscoring the dynamic balance between genetic diversification and evolutionary preservation.
Our study represents a comprehensive sequence data analysis with respect to the aspects of molecular phylogeny and genetic diversity.To achieve the computational results of genetic diversity, groups A and B were formed considering variations in the length of the mtCOI gene of onion thrips.High Nm values indicating high gene flow were observed more frequently across all populations in Group A than in Group B, indicating that these groups had recent population bottlenecks with varying Hd and Pi values (Grant & Bowen, 1998).Small Hd and Pi values (0.5 and 0.005, respectively) indicated a recent population bottleneck or founder event by a single or a few mtDNA lineages; high Hd (>0.5) and low Pi values (0.005) indicated a population bottleneck followed by rapid population increase and mutation accumulation; low Hd (0.5) and high Pi (>0.005) values represented geographically subdivided populations; high Hd (0.5) and Pi values (>0.005) indicated a large stable population with a long evolutionary history or secondary contact between distinct lineages (Grant & Bowen, 1998).This helped us determine that genetic drift might lead to changes in diversity and the number of haplotypes (Balloux, Lehmann & de Meeûs, 2003).T. tabaci populations from the central and southern regions of India exhibited high levels of genetic differentiation and diversity compared with populations from other regions, whereas the population from the north-eastern region showed low genetic flow.Geographical isolation and absence of recent gene flow suggested that genetic drift played an important role in genetic differentiation.The findings of this study extend our knowledge on diversity of T. tabaci in India and provide insights for management practices to mitigate its growth.
Pairwise Fst and gene flow (Nm) among different zones, namely west, south, east, and north regions of Indian subcontinent, were calculated.Significant differences were observed between groups A and B, where Group A comprising populations from the southern and western parts of India and Group B, comprising populations from the southern parts, showed high Nm values.Nm < 1 reflected low levels of gene flow and ultimately higher genetic variation such as in north-eastern and northern parts (Tiroesele et al., 2014).Varying levels of gene flow may be attributed to the geographical and genetic distance.Fst values were <0.5 for majority of the samples, which is a relative measure of genetic variation that revealed overall similarity.For Group A, Fst was <0.5, suggesting a low level of genetic variability.
Haplotype networks generated in this study suggested that diverse haplotypes exist in Delhi and Maharashtra.These results are also consistent with those of molecular phylogeny, which indicate that geographical proximity cannot be used to measure molecular evolutionary patterns in mtCOI.Recent investigations in T. tabaci populations inhabiting on Alliums did not determine any correlations between the genetic and geographic distances (Li et al., 2020;Porta et al., 2023).The haplotype networks showed that Group A comprised eight haplotypes, with 307_Himachal Pradesh and 303_Haryana as the most distant haplotypes.In Group B with seven haplotypes, the most distant haplotype was from Tripura.Some states represented more haplotypes, like Maharashtra (KF724977.1_Maharashtra),Delhi (MN594551.1_Delhi),and West Bengal, which may be an indication of population stabilization or successful niche formation by the species.
Studies have reported links between traits, reproductive modes, and distribution of genetic variation from different countries, where, in particular, the mode of reproduction of thrips population affects their diversity (Westmore et al., 2013;Gawande et al., 2017;Jacobson & Kennedy, 2013;Nault, Kain & Wang, 2014;Jacobson et al., 2016).In this study, we could not highlight the presence of arrhenotokous and thelytokous populations, which could have helped rule out low or high genetic diversity to further explore Indian thrips.

CONCLUSIONS
To the best of our knowledge, this study marks the first attempt to explore the genetic structure of T. tabaci populations.Amplicon sequencing of the mtCOI gene was used for the analysis.Our investigation reaffirmed the previous findings that T. tabaci isolates in the Indian subcontinent exhibit significant diversity.
Our findings unveil multiple insertions and deletions occurring at various genomic positions across different localities, with the highest degree of variation observed within the 300-400 genome position range.Analysis of SNP patterns within the mtCOI gene across diverse Indian locales revealed significant intrapopulation genetic heterogeneity and its potential repercussions on gene functionality.Molecular diversity analyses identified a total of 30 haplotypes within the population, with noticeable gene flow occurring between specific subpopulations.Higher Fst values within northern-western subpopulations indicated considerable genetic variability, particularly evident in haplotype networks predominantly originating from the northern region, notably Delhi.While most populations exhibited stable and ancient evolutionary histories, the thrips populations in northern-western and north-eastern India seemed to indicate rapid growth.

Figure 2 Figure 3
Figure 1 Locality wise percent reads with variations.

Figure 5
Figure 5 Haplotype network visualized in POPART for group A. Numbers in bracket on the edges signify number of mutations and KF724977.1-Maharashtrarepresents predominant haplotype.Network has 8 haplotypes.Full-size DOI: 10.7717/peerj.17679/fig-5

Table 2 Distribution of Group A.
nucleotides in length, whereas Group B comprised 22 sequences, each having a length of 675 nucleotides (Tables

Table 5 Genetic diversity analysis of group A and B.
B low gene flow and potentially higher genetic differences in these subpopulations within specific geographical territories.Fst values were <0.5, implying overall similarity between various subpopulations of groups A and B. The low Fst values among western, southern, Gawai et al. (2024), PeerJ, DOI 10.7717/peerj.1767910/19